Introduction

Economic inequality is a topic that has been discussed for decades, if not centuries. It seems to be an important indicator of the social development of societies.

The topic has become popular in the mainstream media in the last ten to fifteen years, especially after the Great Recession of 2008-2009. The French economist has also helped popularize the topic, as have Bernie Sanders’ bids for the US presidency in 2016 and 2020. By now, it is well known that global economic inequality is immense, but how is it related to economic growth? Does economic growth increase or decrease inequality in OECD countries?

Method

The primary purpose of this project is to collect data on economic growth and inequality, as measured by the Gini Index, for OECD countries over time in order to look at both levels and trends. In addition, other data points are explored to see their relationship with inequality indicators. Finally, a regression analysis is performed in order to look at the whole picture.

To limit the sample of countries, I have chosen to analyze OECD member countries as of November 2020.

Data sources

I chose panel data from the World Bank DataBank because it provides the variables of interest over time in an open-access format.

Gini Index (gini)

Using the Gini Index for inequality studies has both advantages and disadvantages; however, it is a measure that makes it easy to compare countries and periods in an easily digestible manner. The World Bank reports it on a scale from 0 to 100, where 0 means perfect equality and 100 means perfect inequality. Thus, a higher Gini index means that the country’s income distribution is less equal.

The Gini Index used in my analysis is sourced from World Bank (n.d.c) as it has recent estimates for a large sample of OECD countries. The Gini Index can be calculated with different methodologies; therefore, it is not a good idea to mix data sources of calculated Gini Indexes. It is possible to do the calculation yourself, but finding the necessary data and performing the calculations for the chosen countries and years is outside the scope of this project.
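To make the measure concrete, here is a minimal Python sketch of how a Gini index could be computed from a list of individual incomes, returned on a 0 to 100 scale. It is an illustration only and not the World Bank's methodology, which works from survey data; the function name `gini_index` and the income lists are made up for the example.

```python
def gini_index(incomes):
    """Return the Gini index (0-100 scale) for a list of non-negative incomes."""
    values = sorted(incomes)
    n = len(values)
    total = sum(values)
    if n == 0 or total == 0:
        raise ValueError("need at least one positive income")
    # Rank-weighted form: G = 2 * sum(i * x_i) / (n * sum(x)) - (n + 1) / n,
    # where x_i are the incomes in ascending order and i runs from 1 to n.
    weighted = sum(rank * value for rank, value in enumerate(values, start=1))
    return 100 * (2 * weighted / (n * total) - (n + 1) / n)


# Hypothetical incomes, not real data.
equal = gini_index([10, 10, 10, 10])    # everyone earns the same
unequal = gini_index([0, 0, 0, 40])     # one person earns everything
```

With only four people, the most unequal case gives 75 rather than 100, because the upper bound of this formula for a finite sample of size n is 100 * (n - 1) / n.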

Annual GDP growth (gdpg)

The annual growth rate of GDP is used to look at the relationship between economic growth and inequality. Do high-growth periods increase or decrease inequality? The data is sourced from World Bank (n.d.a).

Life expectancy at birth (lifeexp)

Life expectancy (World Bank n.d.d) can be used to give a picture of the health of a population. It gives some insight into whether the population is fed, healthy and has adequate access to necessary means to live.

Mean years of education (education)

Another interesting data point to look at is how educated the population of a given country is. In this project, it is used as a proxy for human capital. A well-educated population may be more productive and thus have a positive effect on economic growth. The data for mean years of education is sourced from World Bank (n.d.b), and I am using the data point “Mean years of schooling; Percentage of population (age 25+) by educational attainment” in my compiled dataset.

Data cleaning

Since most of the data was already in a machine-readable format, extensive cleaning was not necessary. The data was panel data, so it only had to be imported and processed. I wrote a function in R for the importing tasks to minimize code duplication. It imports the data and uses filters to select the variables we are interested in. It then converts the variable into a numeric format so that we can use it for plotting and calculation, and it formats the date objects in preparation for plotting. The data is then converted into a long data frame for more straightforward plotting.
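As an illustration of these import steps (not the R function used in the project), a minimal Python sketch might look like this. The column names (`Country Name`, `Series Name`, year columns such as `1979 [YR1979]`), the use of `..` for missing values, the function name `read_long`, and the sample values are all assumptions made for the example.

```python
import csv
import io

# Hypothetical wide-format export with made-up values, one column per year.
SAMPLE = """Country Name,Country Code,Series Name,1979 [YR1979],1986 [YR1986]
Norway,NOR,Gini index,25.0,..
United States,USA,Gini index,35.0,37.0
"""


def read_long(text, series_name, value_name):
    """Parse a wide export and return long rows: one dict per country and year."""
    rows = []
    for record in csv.DictReader(io.StringIO(text)):
        if record["Series Name"] != series_name:
            continue  # filter: keep only the variable of interest
        for column, raw in record.items():
            if "[YR" not in column:
                continue  # skip identifier columns such as the country name
            year = int(column.split(" ")[0])
            # Convert to a number; ".." is assumed to mark a missing value.
            value = float(raw) if raw not in ("", "..") else None
            rows.append({"country": record["Country Name"],
                         "year": year,
                         value_name: value})
    return rows


gini_long = read_long(SAMPLE, "Gini index", "gini")
```

Each row of the result holds one country, one year, and one value, which is the long layout that plotting libraries generally expect.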

Finally, all the different data sets are merged into one long data frame that consists of all observations without removing missing values.
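The merge can be pictured as an outer join on country and year that keeps incomplete observations. The Python sketch below shows the idea under that assumption; the helper `merge_long` and the sample rows are hypothetical and are not taken from the project's R code or data.

```python
def merge_long(**tables):
    """Outer-join long tables on (country, year), keeping missing values as None."""
    names = list(tables)
    merged = {}
    for name, rows in tables.items():
        for row in rows:
            key = (row["country"], row["year"])
            # Start every observation with all variables missing, then fill in.
            entry = merged.setdefault(key, {n: None for n in names})
            entry[name] = row[name]
    return [
        {"country": country, "year": year, **values}
        for (country, year), values in sorted(merged.items())
    ]


# Hypothetical rows: gini is only available for 2017, gdpg for both years.
gini = [{"country": "Norway", "year": 2017, "gini": 27.0}]
gdpg = [
    {"country": "Norway", "year": 2016, "gdpg": 1.0},
    {"country": "Norway", "year": 2017, "gdpg": 2.0},
]
panel = merge_long(gini=gini, gdpg=gdpg)
```

Keeping the 2016 row even though its `gini` value is missing means that each later analysis can decide for itself which observations to drop.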

Visualization

Several plots were generated to convey the story of the project. Before performing the final regression analysis, I created several plots to inspect how the other variables relate to inequality as represented by the Gini index. In all visualizations where the Gini index is used, it is plotted on the Y-axis for consistency. This is perhaps not the best choice for variables such as GDP growth, but when plotting the panel data it is what shows the data most clearly.

Income inequality in OECD countries

Gini Index over time

Source: World Bank DataBank

The above plot allows us to inspect the evolution of the Gini index of a given country over time. Double-clicking on a country isolates it, while clicking once on a country removes it from the view of all countries. For Norway, the Gini Index was at approximately the same level in 2017 as in 1979. After the first data point, however, inequality decreased for a period before it began to increase in 1986. There is a peak between 2003 and 2006, and the explanation for this peak warrants further inspection of the data.

For the United States, the story is different: from the first estimate in 1974 to the most recent in 2016, inequality has increased. In the plot below, I illustrate who has gained and who has lost from this change in the Gini index.

The evolution of the richest and the poorest

Source: World Bank DataBank

While global inequality is decreasing, many developed countries are experiencing an increase in inequality where the rich get richer and the poor get poorer. This is illustrated by using panel data to plot the income share held by the highest 10% and the income share held by the lowest 10% over time. By double-clicking on the United States, we can see its graph isolated. The income share held by the highest 10% has increased since the 1970s, while the income share held by the lowest 10% has decreased slightly. This is consistent with the increase in inequality as measured by the Gini Index.

Life expectancy and the Gini index

When the Gini index and life expectancy for 2017 are plotted together, there seems to be a positive relationship between lower inequality and higher life expectancy.

Education and the Gini index

There is a similar story for education and inequality, although the positive relationship is more evident. The countries with the highest inequality are also the countries with the lowest mean years of education.

Results

Regression analysis

By using linear regression analysis, we can better test the relationship between the dependent and independent variable(s). First, I perform a simple linear regression with gini as the dependent variable and gdpg as the independent variable to see the relationship between the two. Then I include the other variables discussed above in order to see if they have a more substantial effect and to reduce the risk of omitted variable bias. The results are summarized in the table below.
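For readers unfamiliar with the technique, the simple (one-variable) case can be sketched in a few lines of Python. This is an illustration of ordinary least squares with made-up numbers, not the code or data behind the results in this project; the function name `simple_ols` and the sample values are assumptions. Pairs with a missing value are skipped, which is one common way of handling incomplete observations.

```python
def simple_ols(x, y):
    """Fit y = intercept + slope * x by least squares, skipping incomplete pairs."""
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    n = len(pairs)
    mean_x = sum(a for a, _ in pairs) / n
    mean_y = sum(b for _, b in pairs) / n
    sxx = sum((a - mean_x) ** 2 for a, _ in pairs)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in pairs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # R^2 = 1 - residual sum of squares / total sum of squares
    ss_res = sum((b - (intercept + slope * a)) ** 2 for a, b in pairs)
    ss_tot = sum((b - mean_y) ** 2 for _, b in pairs)
    return slope, intercept, 1 - ss_res / ss_tot


# Hypothetical values: growth rates (gdpg) and Gini indexes (gini).
gdpg = [1.0, 2.0, None, 3.0, 4.0]
gini = [30.0, 32.0, 35.0, 31.0, 35.0]
slope, intercept, r_squared = simple_ols(gdpg, gini)
```

The slope plays the role of the gdpg coefficient, the intercept the role of the constant, and `r_squared` is the share of the variation in the dependent variable that the fitted line accounts for.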

Results of regression analysis

                       Dependent variable: gini
                       (1)                     (2)
gdpg                   0.167*                  0.062
                       p = 0.086               p = 0.471
education                                      -1.948***
                                               p = 0.000
lifeexp                                        -0.531***
                                               p = 0.00000
Constant               33.091***               97.096***
                       p = 0.000               p = 0.000
Observations           557                     326
R2                     0.005                   0.456
Adjusted R2            0.004                   0.451
Residual Std. Error    7.312 (df = 555)        4.959 (df = 322)
F Statistic            2.968* (df = 1; 555)    90.046*** (df = 3; 322)
Note: *p<0.1; **p<0.05; ***p<0.01

The single-variable regression is model (1) and the multiple-variable regression is model (2). The first model has a low \(R^2\) of 0.005, meaning that gdpg explains only about 0.5% of the variation in gini, and the gdpg coefficient, with a p-value of 0.086, is significant only at the 10% level. Plotting the model, we see that this may be caused by large variability in both variables in the data:

Regression plot of the single variable linear regression model

In the multiple linear regression, the relationship between gini and gdpg is still slightly positive (0.062), but with a p-value of 0.471 it is not statistically significant. The \(R^2\) of 0.456 and the significant F statistic suggest that this model is a better fit for our data than model (1). The other variables, education and lifeexp, both have a negative and statistically significant relationship with gini, meaning that for the sampled countries, increases in education and life expectancy are associated with a decrease in inequality as measured by the Gini Index.
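To read the coefficients of model (2) concretely, the fitted equation can be evaluated for a given set of inputs. The Python sketch below uses the coefficients reported in the table; the input values (2% growth, 12 or 13 years of education, life expectancy of 80) are hypothetical and chosen only for illustration.

```python
# Coefficients of model (2) as reported in the regression table.
COEFFICIENTS = {
    "constant": 97.096,
    "gdpg": 0.062,
    "education": -1.948,
    "lifeexp": -0.531,
}


def predicted_gini(gdpg, education, lifeexp):
    """Evaluate the fitted equation of model (2) for one set of inputs."""
    return (COEFFICIENTS["constant"]
            + COEFFICIENTS["gdpg"] * gdpg
            + COEFFICIENTS["education"] * education
            + COEFFICIENTS["lifeexp"] * lifeexp)


# Hypothetical country: 2% growth, 12 years of schooling, life expectancy 80.
baseline = predicted_gini(2.0, 12.0, 80.0)
# Same country with one more year of schooling, everything else unchanged.
more_schooling = predicted_gini(2.0, 13.0, 80.0)
difference = more_schooling - baseline
```

The difference between the two predictions equals the education coefficient: holding the other variables fixed, one additional year of mean schooling is associated with a Gini index about 1.9 points lower. This is an association in the fitted model, not evidence of a causal effect.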

The validity of these results warrants further testing; however, that is outside my current capabilities.

Conclusion

The relationship between economic growth and inequality is not clear. In this project, I attempted to get a better understanding of the relationship by using modern data science tools. It is a large topic that requires much more research to reach anything resembling an answer; however, this has been an exciting project for developing my data science skills.

Given the current situation in the world, many expect that inequality will increase in developed countries because of expected recessions following the COVID-19 pandemic. As we have seen, many OECD countries have had an increase in inequality over the past decade, and future research will have to determine if this trend continues or not.

References

OECD. 2014. “Growth and Inequality: A Close Relationship? - OECD.” 2014. http://www.oecd.org/economy/growth-and-inequality-close-relationship.htm.

Roser, Max. 2013. “Global Economic Inequality.” Our World in Data, November. https://ourworldindata.org/global-economic-inequality.

World Bank. n.d.a. “Annual GDP Growth.” World Development Indicators. Accessed November 10, 2020. https://databank.worldbank.org/reports.aspx?source=2&series=NY.GDP.PCAP.KD.ZG&country=#.

———. n.d.b. “Education Statistics: Education Attainment | DataBank.” Accessed December 1, 2020. https://databank.worldbank.org/source/education-statistics:-education-attainment.

———. n.d.c. “Gini Index (World Bank Estimate).” Accessed November 10, 2020. https://databank.worldbank.org/reports.aspx?source=2&series=SI.POV.GINI&country=#.

———. n.d.d. “Life Expectancy at Birth.” Accessed November 5, 2020. https://databank.worldbank.org/reports.aspx?source=2&series=SP.DYN.LE00.IN&country=.